MAVEQC is a flexible R package that provides QC analysis of Saturation Genome Editing (SGE) experimental data. It is available under GPL 3.0 from https://github.com/wtsi-hgi/MAVEQC
Displays QC plots and statistics for all samples.
Note: expected read length is 300.
Note: the primer lengths are subtracted from the read length based on the sample sheet information (see 2.1. Sample Sheet: quants_append_start and quants_append_end).
Pass criterion: more than 90% of reads are longer than 200 nucleotides
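The read length criterion above can be checked with a few lines of Python. This is a minimal sketch rather than MAVEQC's own implementation: the function name `read_length_qc`, its parameters and the example lengths are hypothetical, and it assumes the lengths passed in have already had the primer lengths subtracted.

```python
def read_length_qc(read_lengths, min_length=200, min_fraction=0.90):
    """Return (percent of reads longer than min_length, pass flag)."""
    if not read_lengths:
        raise ValueError("no reads supplied")
    long_reads = sum(1 for n in read_lengths if n > min_length)
    # "more than 90%" is treated as a strict inequality
    passed = long_reads / len(read_lengths) > min_fraction
    return 100.0 * long_reads / len(read_lengths), passed


# Hypothetical sample: 19 of 20 reads exceed 200 nucleotides
lengths = [260] * 19 + [150]
percent_long, passed = read_length_qc(lengths)
print(f"{percent_long:.1f}% of reads > 200 nt, pass = {passed}")
```

Reads of exactly 200 nucleotides do not count as "longer than 200" in this sketch.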
Statistics on missing variants in the library
Pass criterion: less than 1% of expected variants are missing
Note: Unique indicates that a template sequence occurs only once in the VaLiAnT meta file (1: Unique, 0: Not Unique). This is important because a template sequence can occur more than once, depending on the mutation types applied in VaLiAnT.
Note: The table below lists the missing variants across all samples, so a variant may appear more than once.
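The missing-variant check can be sketched as a set comparison between the expected library sequences and the observed counts. This is an illustration only: `missing_variant_qc` and its inputs are hypothetical names, and it assumes that a variant counts as missing when it has zero reads and that duplicated template sequences are collapsed before the percentage is taken.

```python
def missing_variant_qc(expected_seqs, observed_counts, max_missing_fraction=0.01):
    """Return (sorted missing sequences, percent missing, pass flag)."""
    expected = set(expected_seqs)  # ASSUMPTION: duplicates are collapsed
    if not expected:
        raise ValueError("no expected variants supplied")
    # ASSUMPTION: a variant is missing when it has no reads in the sample
    missing = sorted(s for s in expected if observed_counts.get(s, 0) == 0)
    passed = len(missing) / len(expected) < max_missing_fraction
    return missing, 100.0 * len(missing) / len(expected), passed


# Hypothetical library of four template sequences
expected = ["ACGT", "ACGA", "ACGC", "ACGG"]
counts = {"ACGT": 12, "ACGA": 0, "ACGC": 7}
missing, percent_missing, passed = missing_variant_qc(expected, counts)
print(missing, percent_missing, passed)
```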
Displays the total number of reads per sample. Filtering is based on one-dimensional k-means clustering, which excludes unique sequences with low read counts.
Total Reads: the total number of raw reads
Pass criterion: more than 1,000,000 total reads
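To illustrate the filtering idea, the sketch below runs one-dimensional k-means with two clusters and keeps the sequences that fall in the higher cluster. The text does not state the number of clusters, the transformation, or the initialisation, so the choices here (k = 2, log2(count + 1), centres started at the minimum and maximum) are assumptions, and all function names are hypothetical.

```python
import math


def kmeans_1d_two_clusters(values, max_iter=100):
    """Lloyd's algorithm in one dimension with k = 2.

    Returns (low_centre, high_centre, labels); labels[i] is 0 for the low
    cluster and 1 for the high cluster.
    """
    if not values:
        raise ValueError("no values supplied")
    low, high = min(values), max(values)
    if low == high:
        # All values identical: nothing to separate, keep everything
        return low, high, [1] * len(values)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins the nearer centre
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        low_members = [v for v, lab in zip(values, labels) if lab == 0]
        high_members = [v for v, lab in zip(values, labels) if lab == 1]
        # Update step: each centre moves to the mean of its members
        new_low = sum(low_members) / len(low_members)
        new_high = sum(high_members) / len(high_members)
        if new_low == low and new_high == high:
            break
        low, high = new_low, new_high
    return low, high, labels


def filter_low_count_sequences(counts):
    """Keep sequences whose log2(count + 1) lands in the high cluster."""
    names = list(counts)
    logged = [math.log2(counts[n] + 1) for n in names]  # ASSUMPTION: log scale
    _, _, labels = kmeans_1d_two_clusters(logged)
    return {n: counts[n] for n, lab in zip(names, labels) if lab == 1}


# Hypothetical read counts per unique sequence
counts = {"seq_a": 1, "seq_b": 2, "seq_c": 1,
          "seq_d": 480, "seq_e": 510, "seq_f": 530}
accepted = filter_low_count_sequences(counts)
print(sorted(accepted))
```

Clustering on a log scale is one way to stop a few very abundant sequences from dominating the split; on raw counts the boundary would sit much higher.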
Displays the percentage of library reads vs non-library reads (i.e. Reference, PAM and Unmapped) for Accepted Reads (see 2.2.3 explanation).
Pass criterion: more than 40% of accepted reads are library reads
Note: Accepted reads are the reads that remain after the filtering described in 2.2.3.
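As a rough sketch of the library-read criterion, the snippet below converts per-category counts of accepted reads into percentages and applies the 40% threshold. The category keys and the function name `library_read_qc` are hypothetical; only the four categories and the threshold come from the text.

```python
def library_read_qc(accepted_reads, min_library_fraction=0.40):
    """Return (percent per category, pass flag) for accepted reads."""
    total = sum(accepted_reads.values())
    if total == 0:
        raise ValueError("no accepted reads supplied")
    percentages = {k: 100.0 * v / total for k, v in accepted_reads.items()}
    passed = accepted_reads.get("library", 0) / total > min_library_fraction
    return percentages, passed


# Hypothetical accepted-read counts for one sample
sample = {"library": 600_000, "reference": 250_000,
          "pam": 50_000, "unmapped": 100_000}
percentages, passed = library_read_qc(sample)
print(percentages, passed)
```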
Defines the mean read count per template oligo sequence (dividing the total number of library reads by the total number of library sequences).
Pass criterion: library coverage is more than 100 reads
Note: Missing variants (0 counts in the library) are not shown.
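The coverage definition above amounts to a single division. A minimal sketch, assuming the denominator includes every library sequence (also those with zero reads); the name `library_coverage_qc` and the example counts are hypothetical.

```python
def library_coverage_qc(library_counts, min_coverage=100):
    """library_counts: read count per template oligo sequence.

    Returns (mean reads per sequence, pass flag).
    """
    if not library_counts:
        raise ValueError("no library sequences supplied")
    # Total library reads divided by the total number of library sequences
    coverage = sum(library_counts) / len(library_counts)
    return coverage, coverage > min_coverage


# Hypothetical library of four sequences, one of them missing
coverage, passed = library_coverage_qc([150, 90, 0, 240])
print(coverage, passed)
```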
Low Abundance cutoff: the green dashed line indicates the threshold used to determine whether a variant is low abundance (less than 5 reads)
Pass criterion: the percentage of low-abundance variants is lower than 30%
% Low Abundance: the percentage of variants below the low abundance cutoff
Note: Missing variants (0 counts in the library) are not shown.
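The low-abundance statistic can be sketched as follows. The text does not say whether zero-count (missing) variants enter the percentage, so this example leaves that decision to the caller; `low_abundance_qc` and the example counts are hypothetical.

```python
def low_abundance_qc(variant_counts, cutoff=5, max_low_fraction=0.30):
    """Return (percent of variants with fewer than `cutoff` reads, pass flag)."""
    if not variant_counts:
        raise ValueError("no variants supplied")
    low = sum(1 for c in variant_counts if c < cutoff)
    passed = low / len(variant_counts) < max_low_fraction
    return 100.0 * low / len(variant_counts), passed


# Hypothetical read counts for ten variants; two fall below 5 reads
counts = [3, 4, 12, 50, 7, 90, 120, 8, 33, 61]
percent_low, passed = low_abundance_qc(counts)
print(percent_low, passed)
```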
Plot panels: condition_Day7_vs_Day4 and condition_Day15_vs_Day4
Summary of the final results. The cutoffs used for PASS/FAIL are listed below.
| name | description |
|---|---|
| log2FoldChange | This is the initial log2FoldChange from DESeq2 using all the accepted reads |
| lfcSE | This is the initial lfcSE (log2FoldChange Standard Error) from DESeq2 using all the accepted reads |
| padj | This is the initial corrected p-value from DESeq2 using all the accepted reads |
| median control value | This is the median log2 fold change of the control variants (synonymous variants and intronic variants) |
| adj_log2FoldChange | This is the adjusted log2FoldChange calculated by subtracting the median control value |
| adj_score | This is the adjusted score that is calculated from the adj_log2FoldChange divided by the lfcSE |
| adj_pval | This is the adjusted p-value derived from adj_score |
| adj_fdr | This is the adjusted FDR derived from adj_pval |
| stat | This indicates an enriched or a depleted status. adj_fdr < 0.05 & adj_log2FoldChange > 0 is enriched, adj_fdr < 0.05 & adj_log2FoldChange < 0 is depleted |
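
The column definitions in the table can be chained into a short calculation. The sketch below is not MAVEQC's code. The table says only that adj_pval is "derived from adj_score" and adj_fdr "derived from adj_pval", so the two-sided normal p-value and the Benjamini-Hochberg correction used here are assumptions, as are the `consequence` field, the function names and the `unchanged` label for variants that are neither enriched nor depleted.

```python
import math
from statistics import median


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def adjust_results(rows, controls=("synonymous", "intronic"), alpha=0.05):
    """Add adj_log2FoldChange, adj_score, adj_pval, adj_fdr and stat to rows."""
    control_lfc = [r["log2FoldChange"] for r in rows if r["consequence"] in controls]
    median_control = median(control_lfc)
    out = []
    for r in rows:
        adj_lfc = r["log2FoldChange"] - median_control
        adj_score = adj_lfc / r["lfcSE"]
        # ASSUMPTION: two-sided p-value from a standard normal distribution
        adj_pval = math.erfc(abs(adj_score) / math.sqrt(2))
        out.append({**r, "adj_log2FoldChange": adj_lfc,
                    "adj_score": adj_score, "adj_pval": adj_pval})
    # ASSUMPTION: FDR obtained with the Benjamini-Hochberg procedure
    for r, q in zip(out, benjamini_hochberg([r["adj_pval"] for r in out])):
        r["adj_fdr"] = q
        if q < alpha and r["adj_log2FoldChange"] > 0:
            r["stat"] = "enriched"
        elif q < alpha and r["adj_log2FoldChange"] < 0:
            r["stat"] = "depleted"
        else:
            r["stat"] = "unchanged"  # placeholder label, not from the text
    return out


# Hypothetical DESeq2-style rows
rows = [
    {"id": "v1", "consequence": "synonymous", "log2FoldChange": 0.1, "lfcSE": 0.2},
    {"id": "v2", "consequence": "synonymous", "log2FoldChange": 0.3, "lfcSE": 0.2},
    {"id": "v3", "consequence": "intronic", "log2FoldChange": 0.2, "lfcSE": 0.2},
    {"id": "v4", "consequence": "missense", "log2FoldChange": -3.8, "lfcSE": 0.4},
    {"id": "v5", "consequence": "missense", "log2FoldChange": 2.2, "lfcSE": 0.4},
]
results = adjust_results(rows)
for r in results:
    print(r["id"], round(r["adj_log2FoldChange"], 2), r["stat"])
```

Centring on the median of the control variants means that synonymous and intronic variants sit around zero, so enrichment and depletion are read relative to variants expected to have no effect.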